Abstract: Challenging manipulation tasks can be solved 
effectively by combining individual robot skills, which must 
be parameterized for the concrete physical environment and 
task at hand. This is time-consuming and difficult for human 
programmers, particularly for force-controlled skills. To this 
end, we present Shadow Program Inversion (SPI), a novel ap- 
proach to infer optimal skill parameters directly from data. SPI 
leverages unsupervised learning to train an auxiliary differen- 
tiable program representation (“shadow program”) and realizes 
parameter inference via gradient-based model inversion. Our 
method enables the use of efficient first-order optimizers to 
infer optimal parameters for originally non-differentiable skills, 
including many skill variants currently used in production. 
SPI zero-shot generalizes across task objectives, meaning that 
shadow programs do not need to be retrained to infer pa- 
rameters for different task variants. We evaluate our methods 
on three different robots and skill frameworks in industrial 
and household scenarios. Code and examples are available at 


https://innolab.artiminds.com/icra2021 


I. INTRODUCTION 


Combining individual robot skills to solve complex tasks 
has established itself as one of the primary programming 
paradigms in robotics. A large variety of skill variants such as 
Dynamic Movement Primitives (DMPs) [1], Task and Motion 
Planning (TAMP) operators [2] or generalized manipulation 
strategies [3] have been proposed, all of which allow the 
behavior of skills to be adapted to the task at hand by a 
set of skill parameters such as velocities, scaling factors or 
via-points. Finding appropriate values for these parameters is 
difficult and typically requires manual tweaking. Consider a 
force-controlled spiral search skill, where the robot executes 
a spiral motion until a drop in forces at the tool center 
point (TCP) indicates that a hole has been found. The optimal 
parameters, i.e. spiral orientation and extents, velocity, 
acceleration and pushing force, to maximize the likelihood 
of finding a hole without sacrificing too much time, depend 
on the probability distribution of the hole position and the 
physical properties of the surfaces. For a human programmer, 
tuning these parameters involves much trial and error. 
Differentiable programming (∂P) provides an elegant solution: 
If a program is fully differentiable, it allows for 
the gradient-based optimization of program parameters with 
respect to nearly arbitrary objective functions over the pro- 
gram’s outputs [4], [5]. However, most skill libraries used in 
practice are not differentiable. In particular, force-controlled 
skills such as spiral search or moment-free insertion are often 
implemented using highly performant but non-differentiable 
force controllers provided by robot manufacturers. 

Fig. 1: Parameter inference via Shadow Program Inversion. 
By inverting learned differentiable shadow models (right) of 
a sequence of robot skills (left), optimal program parameters 
can be inferred directly from data. 

In this paper, we present Shadow Program Inversion (SPI), 
a novel ∂P-based method of inferring optimal parameters 
for robot programs of non-differentiable skills. We propose 
to learn a differentiable surrogate (called shadow model) 
of each skill, which is trained via unsupervised learning to 
predict the expected trajectory (TCP poses and wrenches) 
when executing the skill for a given set of parameters. Just 
like skills can be chained to solve complex tasks, their 
shadow models can be chained to form a differentiable 
shadow program. We then apply a gradient-based neural 
network inversion technique to the shadow program to jointly 
infer the skill parameters which maximize a set of task 
objectives. By maintaining a one-to-one relationship between 
the original robot skills and their differentiable shadows, the 
optimized program parameters can be transferred back to the 
original skills, which are used for execution on the robot. 
By conducting gradient-based parameter inference over differentiable 
surrogates rather than the actual skills, our 
method can be applied to optimize parameters for widely used 
skill representations such as DMPs, which do not have to 
be differentiable. Moreover, it generalizes in a zero-shot 
manner across task objectives, avoiding retraining when task 
objectives change. We demonstrate the broad applicability 
of our approach on three use cases from industrial and 
household robotics, involving diverse skill representations 
ranging from high-level generalized manipulation strategies 
to low-level DMPs and the URScript robot API [6]. To 
our knowledge, SPI is the first application of gradient-based 
model inversion to robot skill parameterization. 
Our main contributions can be summarized as follows: 


1) Shadow models, differentiable surrogate representa- 
tions of possibly non-differentiable skills. 

2) Shadow programs, end-to-end differentiable represen- 
tations of complete skill-based robot programs. 

3) Shadow Program Inversion, an algorithm to effi- 
ciently infer optimal skill parameters via gradient- 
based model inversion. 

4) Evaluation of Shadow Program Inversion on three 
different robots and real-world use cases. 


II. RELATED WORK 


1) Robot skill parameter inference: Several approaches 
for the automatic optimization of robot skill parameters 
have been proposed, which rely on gradient-free optimiza- 
tion techniques such as evolutionary algorithms [7], [8], 
[9] or Bayesian optimization [10], [11], [12] due to the 
non-differentiability of most skill libraries and frameworks. 
Gradient-free approaches require frequent execution of the 
skills during optimization, which is time-consuming 
if done on real robot systems, has to be repeated 
whenever the task objectives change and often requires good 
initial parameterizations. We propose to use a gradient-based 
optimizer on a differentiable surrogate model of the skills, 
which avoids these issues. 

2) Unsupervised representation learning: Recent robot 
learning approaches such as Visual Foresight [13], Adver- 
sarial Skill Networks [14], Time Reversal [15] or “Learning 
from Play” [16] propose self-supervised or unsupervised 
learning of a predictive model, i.e. a model mapping skill inputs 
to a latent trajectory representation, from which a policy to 
solve the task is then derived. We take a similar approach to 
parameterize skill-based robot programs by learning a differ- 
entiable skill model and exploit its differentiability to infer 
optimal program parameters. Like most approaches relying 
on self-supervised representation learning, ours generalizes 
across task variants without requiring additional training. 

3) Differentiable programming: Differentiable programming 
(∂P) proposes to express programs as differentiable 
computational graphs, which permit the optimization of 
program parameters via reverse-mode automatic differentiation 
and gradient-based optimization [5], [4]. From a ∂P 
perspective, neural networks can be considered a type of 
differentiable program, and can be combined into computational 
graphs-of-networks [17], [18], [19] as well as hybrid 
architectures combining neural networks with differentiable 
hand-written algorithms or data structures [20], [21], [22], 
[23], [24]. In the domain of robotics, ∂P has been realized in 
the form of hybrid skill representations such as Conditional 
Neural Movement Primitives [25], [26], Deep Movement 
Primitives [27] or Differentiable Algorithm Networks [28], 
which combine differentiable algorithmic priors with neural 
networks and can be combined into complex robot programs in 
a modular fashion. Like most prior work on ∂P in robotics, 
we provide modular interfaces to combine differentiable 
functional blocks into complex programs, and combine hand-written 
computational graphs with neural networks to simplify 
the learning problems. In contrast to prior work, however, 
we rely exclusively on unsupervised learning, which 
greatly simplifies data collection and model training. In work 
similar to ours, Zhou et al. [29] propose to generate DMP 
parameters directly with a Mixture Density Network. We 
instead propose to learn a differentiable model of the DMP 
and to infer the optimal skill parameters via model inversion 
after training. The advantage is that no retraining is required 
when the task objective changes. Moreover, our approach 
can be applied to near-arbitrary skill representations beyond 
DMPs and allows the optimization of parameters for complex 
programs composed of multiple skills. 


III. DEFINITIONS & PROBLEM STATEMENT 


A. Definition: Skill-based Robot Program 


We define a skill-based robot program as a directed acyclic 
graph (DAG) of skills, where each skill is defined as a 
function $f : \mathcal{X} \times \mathcal{S} \to \mathcal{S} \times \mathcal{Y}$, with $\mathcal{X}$ the space of the skill's 
exposed input parameters, $\mathcal{S}$ the state space comprising the 
current poses of coordinate frames relevant to the task and 
$\mathcal{Y}$ the trajectory space comprising TCP poses and wrenches. 
This definition allows a skill to be treated as a black box which 
maps a vector of inputs $x \in \mathcal{X}$ and prior state $s_{in} \in \mathcal{S}$ to 
posterior state $s_{out} \in \mathcal{S}$ and trajectory $Y \in \mathcal{Y}$. It covers 
DMPs [1], generalized manipulation strategies [3] or any 
skill variant which takes some inputs $x$ and moves the robot. 
It also allows the design of a universal differentiable 
shadow model architecture capable of representing them. 
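
To make the black-box view concrete, the following minimal sketch expresses the skill signature as a Python callable. The type aliases, the reduced state (a single 3D TCP position), the reduced trajectory sample (position plus one scalar force) and the toy skill `linear_motion` are illustrative assumptions, not part of any skill framework discussed here.

```python
from typing import Callable, List, Tuple

# Hypothetical, simplified types: the state holds one 3D TCP position and
# each trajectory sample holds a 3D position plus a scalar contact force.
State = Tuple[float, float, float]
Sample = Tuple[float, float, float, float]
Trajectory = List[Sample]
Skill = Callable[[List[float], State], Tuple[State, Trajectory]]


def linear_motion(x: List[float], s_in: State) -> Tuple[State, Trajectory]:
    """Toy skill: move from s_in to the goal x[0:3] in x[3] equal steps."""
    goal, steps = x[0:3], max(1, int(x[3]))
    trajectory: Trajectory = []
    for t in range(1, steps + 1):
        alpha = t / steps
        pos = tuple(s + alpha * (g - s) for s, g in zip(s_in, goal))
        trajectory.append((*pos, 0.0))  # free-space motion: zero force
    s_out = trajectory[-1][0:3]  # posterior state is the final position
    return s_out, trajectory


# Any function with this signature can be treated as a skill.
skill: Skill = linear_motion
s_out, Y = skill([0.1, 0.0, 0.2, 4], (0.0, 0.0, 0.0))
```

Only the inputs and outputs matter for this definition: how the skill computes its motion internally stays hidden behind the signature.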


B. Problem Statement: Robot Program Parameter Inference 


Sequentially chained skills form complex robot programs 
by propagating the posterior state $s_{i,out}$ of the $i$-th skill in the 
sequence to the prior state $s_{i+1,in}$ of the subsequent skill. 
We postulate the Markov property, i.e. all relevant context 
information is captured in the start state $s_{in}$. Such a skill-based 
robot program realizes the function $P : \mathcal{S} \times \mathcal{X}^n \to \mathcal{S} \times \mathcal{Y}$. 
We seek to optimize a task-dependent objective function 
$\Phi : \mathcal{Y} \to \mathbb{R}$, which assigns a real-valued score to a trajectory. 
For a program containing a spiral search skill, $\Phi$ might assign 
high scores to trajectories which exhibit the characteristic 
drop in forces indicating that the hole has been found. For a 
given program $P$ and initial state $s_{in}$, we seek to solve the 
inverse problem $x^* = \arg\max_{x \in \mathcal{X}^n} \Phi(P(s_{in}, x))$, i.e. to 
find skill parameters which maximize $\Phi$. In the spiral search 
example, $x^*$ contains the velocities, accelerations, spiral 
extents and force setpoints which maximize the likelihood 
of the hole being found. Due to the high dimensionality of 
the combined input space $\mathcal{X}^n$ of complex programs, learning 
to directly compute $x^*$ would require prohibitive amounts 
of supervised training data. We propose to instead train a 
differentiable model $\hat{P}$ (called shadow program) of $P$, and 
to iteratively approximate $x^*$ by gradient descent over $\hat{P}$. 


IV. PARAMETER INFERENCE VIA SHADOW PROGRAM 
INVERSION 


We propose a three-step process to infer skill parameters 
for a skill-based robot program: (1) Construction of a differentiable 
surrogate (shadow program, $\hat{P}$), which predicts the 
expected trajectory (TCP poses and wrenches) given input 
parameters for all skills in the program; (2) unsupervised 
training of the learnable components of this surrogate; and 
(3) inference of optimal skill parameters via gradient-based 
model inversion (cf. figure 1). 


A. Shadow Model & Shadow Program Architecture 


For a given skill-based robot program $P$ (such as a sequence 
of DMPs), whose parameters are to be inferred, we begin 
by constructing the differentiable shadow program $\hat{P}$. To 
that end, we instantiate a differentiable shadow model for 
each skill in $P$. Because we only consider skills which meet 
the definition in section III-A, we can propose one single, differentiable 
shadow model architecture which can model any skill. This 
architecture is illustrated in figure 2. 

A shadow model of the $i$-th skill in a skill-based robot 
program is a differentiable computational graph which computes 
the expected trajectory $\hat{Y}_i$ for a given input vector 
$x_i$ and start state $s_{in,i}$. We represent $\hat{Y}_i$ as a sequence of 
samples $(\hat{Y}_i)_t$, $0 \le t < |\hat{Y}_i|$, each of which contains the 
current success probability $p_{succ} \in [0,1]$, the probability 
$p_{EOS} \in [0,1]$ of the action being completed at time $t$, as 
well as the TCP pose and wrench at time $t$. In figure 2, 
$x_i$ contains the input parameters of a force-controlled spiral 
search skill, and $s_{in,i}$ is set to the final state $s_{out,i-1}$ of 
the previous skill. The inclusion of $p_{succ}$ in $\hat{Y}$ permits the 
inference of skill parameters with respect to skill-specific 
success metrics. For a spiral search, it allows an objective 
function to be specified which maximizes the probability of finding the 
hole (cf. equation 3 and experiment V-C). 

Echoing work in ∂P integrating algorithmic priors and deep 
learning [30], [24], we do not predict the expected trajectory 
$\hat{Y}_i$ end-to-end. Instead, we bootstrap an initial prior trajectory 
estimate $\tilde{Y}_i$ using a differentiable motion planner (footnote 3), which 
produces a crude approximation of the trajectory without 
interactions with the environment such as moving objects or 
applying contact forces. We found that explicitly incorporating 
prior knowledge greatly reduced the amount of training 
data required, particularly for long trajectories. For skills for 
which no such algorithmic prior exists, we instead use a 
generative neural network to bootstrap a prior trajectory from 
$x_i$ and $s_{in,i}$ [34]. The posterior trajectory $\hat{Y}_i$ is the sum of 
the prior and the output of a deep residual Gated Recurrent 
Unit (GRU) [35], which predicts the residual trajectory 
$\hat{Y}_{res,i}$ containing the context-dependent information about 
interactions with the environment, such as (expected) forces 
and torques. For the spiral search skill, for example, the 
residual GRU learns to predict when and where a hole is 
likely to be found, and how the expected trajectory then 
deviates from the prior $\tilde{Y}_i$. 

3 For our experiments (see section V), we implemented simple differentiable 
planners for linear motions, spiral motions and gripper motions by 
reimplementing parts of orocos-kdl [31] and urdfpy [32] with PyTorch [33]. 

Fig. 2: A shadow model of a spiral search skill as part of a 
larger shadow program. 
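
The prior-plus-residual composition described above can be sketched in a few lines. This is a toy illustration under strong assumptions: the state is reduced to a scalar z position, each sample is a `(z, force)` pair, and both `prior_trajectory` and `residual_trajectory` are hand-written stand-ins (the latter replaces the learned residual GRU with a fixed rule). None of these names come from the original implementation.

```python
def prior_trajectory(x, s_in, n=5):
    """Stand-in for the differentiable motion planner: straight-line motion
    along z from s_in to the goal depth x[0], predicting zero force."""
    z0, z_goal = s_in, x[0]
    return [(z0 + (t / (n - 1)) * (z_goal - z0), 0.0) for t in range(n)]


def residual_trajectory(x, s_in, prior):
    """Stand-in for the learned residual network: adds a contact force
    proportional to the velocity x[1] once z reaches a surface at z = 0.
    (A learned model would also condition on s_in; unused in this toy rule.)"""
    return [(0.0, 2.0 * x[1] if z <= 0.0 else 0.0) for z, _ in prior]


def shadow_model(x, s_in):
    """Posterior trajectory = prior + residual, sample by sample."""
    prior = prior_trajectory(x, s_in)
    residual = residual_trajectory(x, s_in, prior)
    posterior = [(pz + rz, pf + rf)
                 for (pz, pf), (rz, rf) in zip(prior, residual)]
    s_out = posterior[-1][0]  # posterior state read off the last sample
    return s_out, posterior


# x = [goal depth, velocity], start 0.1 above the assumed surface
s_out, Y_hat = shadow_model([-0.1, 0.5], 0.1)
```

The planner supplies the overall shape of the motion, so the residual only has to account for interaction effects, which is the motivation the text gives for the reduced data requirement.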

The shadow model architecture is sufficiently flexible to 
model simple and complex skills with varying numbers and 
types of parameters. Note that the “signature” (layout of the 
parameter vector $x$) of a shadow model exactly matches that 
of the skill it models. This permits the transfer of the inferred 
parameters back to the original skill after inference (cf. 
section IV-C). Aside from the optional differentiable motion planner, 
which will differ from skill to skill, the proposed shadow 
model architecture can model any skill which meets the 
skill definition in section III-A. This allows the automatic 
construction of a shadow program for any complex skill-based 
robot program by instantiating a shadow model for 
each skill, and connecting the posterior state $s_{out}$ of each 
shadow model to the prior state $s_{in}$ of the next. 
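
The chaining rule can be illustrated with a small sketch. The helper `make_shadow_program` and the toy model `move_by` are hypothetical names introduced here; real shadow models would return full trajectories of poses and wrenches rather than scalars.

```python
def make_shadow_program(shadow_models):
    """Chain shadow models: the posterior state of each model becomes the
    prior state of the next, and the trajectories are concatenated."""
    def program(s_in, xs):
        state, trajectory = s_in, []
        for model, x in zip(shadow_models, xs):
            state, y_hat = model(x, state)
            trajectory.extend(y_hat)
        return state, trajectory
    return program


def move_by(x, s_in):
    """Toy 1D shadow model: shift the state by x[0], report one sample."""
    s_out = s_in + x[0]
    return s_out, [s_out]


program = make_shadow_program([move_by, move_by])
s_final, Y_hat = program(0.0, [[1.0], [2.0]])
```

Because the program is just a composition of its models, any library of trained shadow models can be recombined at inference time without retraining.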


B. Unsupervised Shadow Model Learning 


Because shadow models are forward models of semi-symbolic 
skills, they can be trained end-to-end on tuples 
$(x, s_{in}, Y)$ (footnote 4) to minimize the mean prediction error 

4 In our implementation, $s_{out}$ can be deterministically computed from the 
output trajectory $Y$ and does not need to be learned explicitly. 

Fig. 3: SPI (right) permits inference of skill parameters ($x_0$ 
and $x_1$) w.r.t. task objectives $G_\Phi$ for non-differentiable robot 
programs (left). 


loss $L_{BCE}$ for $p_{succ}$ and $p_{EOS}$, and the pointwise angle 
between quaternion-encoded TCP orientations 


$$L_{ori}(\hat{y}, y) = \cos^{-1}\left(2 \langle \hat{y}_{ori}, y_{ori} \rangle^2 - 1\right). \quad (2)$$


$\langle q_1, q_2 \rangle$ denotes the inner product of quaternions $q_1$ and $q_2$. 
$(Y)_{n,pos}$ denotes the position component of the $n$-th point on 
trajectory $Y$. Training data can be collected autonomously 
by sampling inputs $x$ and initial states $s_{in}$, executing the 
original (non-differentiable) skills and recording the resulting 
trajectories. In real-world settings, in which programs are 
executed repeatedly over long periods of time, this permits 
the efficient use of readily available unsupervised data and 
facilitates automatic finetuning of the model as new observations 
become available. Because of the Markov property, 
shadow models can be trained separately from one another, 
conditional on the start state $s_{in}$, to form a library of trained, 
differentiable shadow models. 
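
As an illustration of the orientation term, the angle between two unit quaternions can be computed as arccos(2⟨q1, q2⟩² − 1). The sketch below assumes quaternions are given as 4-tuples in (w, x, y, z) order and are already normalized; the function names are chosen here for illustration.

```python
import math


def quat_inner(q1, q2):
    """Inner product of two quaternions given as 4-tuples."""
    return sum(a * b for a, b in zip(q1, q2))


def l_ori(q_pred, q_true):
    """Rotation angle (rad) between two unit quaternions."""
    c = 2.0 * quat_inner(q_pred, q_true) ** 2 - 1.0
    # Clamp to [-1, 1] so rounding errors cannot push acos out of domain.
    return math.acos(max(-1.0, min(1.0, c)))


identity = (1.0, 0.0, 0.0, 0.0)
quarter_turn_z = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
angle = l_ori(identity, quarter_turn_z)  # 90 degree rotation about z
```

Squaring the inner product makes the loss invariant to the sign ambiguity of quaternions, since q and −q encode the same rotation.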


C. Gradient-based Shadow Program Inversion 


To infer optimal parameters for a given skill-based robot 
program, we construct the corresponding shadow program 
by sequentially chaining trained shadow models via $s_{in}$ and 
$s_{out}$ (see figure 3). The shadow program is a differentiable 
graph-of-graphs; its differentiability permits the efficient, 
gradient-based optimization of the input parameters $x_i$, $0 \le i < n$, 
of all $n$ skills in the program with respect to 
an arbitrary, differentiable objective function $G_\Phi$ for task 
objectives $\Phi$. To perform this optimization, we use Neural 
Network Iterative Inversion (NNII) [36]: We randomly 
initialize the $x_i$, perform a forward pass and compute the 
loss $G_\Phi(\hat{Y})$. We exploit the differentiability of the program 
graph to backpropagate the gradients $\partial G_\Phi(\hat{Y})$ and update the 
$x_i$ according to the Adam update rule [37]. Iterating until 
convergence yields the $x_i$ which minimize $G_\Phi$. 
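
The inversion loop can be sketched without any deep learning framework. In the following toy example, `shadow_program` is a hypothetical closed-form surrogate (predicted contact force as a function of velocity and acceleration), central finite differences stand in for backpropagation, and the Adam update is written out by hand. It is meant to show the structure of the iteration, not the actual implementation.

```python
import math


def shadow_program(x):
    """Hypothetical differentiable surrogate: predicted contact force (N)
    as a smooth function of velocity x[0] and acceleration x[1]."""
    return 20.0 * x[0] + 2.0 * x[1] ** 2


def objective(x, f_goal=5.0):
    """Squared deviation of the predicted force from the goal force."""
    return (shadow_program(x) - f_goal) ** 2


def gradient(f, x, eps=1e-6):
    """Central finite differences, standing in for backpropagation."""
    grad = []
    for i in range(len(x)):
        hi = x[:i] + [x[i] + eps] + x[i + 1:]
        lo = x[:i] + [x[i] - eps] + x[i + 1:]
        grad.append((f(hi) - f(lo)) / (2.0 * eps))
    return grad


def invert(f, x0, lr=0.01, steps=2000, b1=0.9, b2=0.999, eps=1e-8):
    """Iteratively update the inputs x (not any model weights) with Adam."""
    x, m, v = list(x0), [0.0] * len(x0), [0.0] * len(x0)
    for t in range(1, steps + 1):
        g = gradient(f, x)
        for i in range(len(x)):
            m[i] = b1 * m[i] + (1.0 - b1) * g[i]
            v[i] = b2 * v[i] + (1.0 - b2) * g[i] ** 2
            m_hat = m[i] / (1.0 - b1 ** t)  # bias-corrected first moment
            v_hat = v[i] / (1.0 - b2 ** t)  # bias-corrected second moment
            x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


x_opt = invert(objective, [0.5, 0.5])
```

The key point is that the surrogate stays fixed while only its inputs are optimized, which is what distinguishes model inversion from training.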

With differentiability w.r.t. $\hat{Y}$ the sole requirement, a wide 
range of objective functions can be applied. In an industrial 
context, typical optimization targets include process metrics 
such as cycle time ($G_{cycle}$), failure rate ($G_{fail}$) or path length 
($G_{path}$). The inclusion of meta information $p_{EOS}$ and $p_{succ}$ 
in $\hat{Y}$ permits the succinct expression of the corresponding 
objective functions: 


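$$G_{fail}(\hat{Y}) = -\max\left(0, \min\left(\frac{1}{|\hat{Y}|} \sum_{n=1}^{|\hat{Y}|} (\hat{Y})_{n,succ},\ 1\right)\right) \quad (3)$$

$$G_{cycle}(\hat{Y}) = \sum_{n=1}^{|\hat{Y}|} \left(1 - \sigma\left(T (\hat{Y})_{n,EOS}\right)\right)$$

$$G_{path}(\hat{Y}) = \frac{1}{|\hat{Y}|} \sum_{n=1}^{|\hat{Y}|-1} \left( \left\| (\hat{Y})_{n+1,pos} - (\hat{Y})_{n,pos} \right\| + L_{ori}\left((\hat{Y})_{n+1,ori}, (\hat{Y})_{n,ori}\right) \right)$$


$\sigma$ is the sigmoid function, $T$ a constant (here $T = 100$) and 
$L_{ori}$ as defined in eqn. 2. Joint optimization of multiple metrics 
at the same time can be realized by linear combination. 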
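
The following sketch shows how objectives of this kind might be evaluated on a predicted trajectory. The exact functional forms are assumptions made for illustration: the failure objective is taken as the negated, clamped mean success probability, the cycle-time objective as a sum of `1 - sigmoid(T * p_eos)` terms, and the path objective as the mean of position and orientation steps. Each sample is assumed to be a dict with hypothetical keys `p_succ`, `p_eos`, `pos` and `ori` (unit quaternion).

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def l_ori(q1, q2):
    """Angle between two unit quaternions."""
    c = 2.0 * sum(a * b for a, b in zip(q1, q2)) ** 2 - 1.0
    return math.acos(max(-1.0, min(1.0, c)))


def g_fail(Y):
    """Assumed form: negated mean success probability, clamped to [0, 1]."""
    mean_succ = sum(s["p_succ"] for s in Y) / len(Y)
    return -max(0.0, min(mean_succ, 1.0))


def g_cycle(Y, T=100.0):
    """Assumed form: soft count of samples before the skill has ended."""
    return sum(1.0 - sigmoid(T * s["p_eos"]) for s in Y)


def g_path(Y):
    """Assumed form: mean translational plus rotational step size."""
    total = 0.0
    for a, b in zip(Y[:-1], Y[1:]):
        total += math.dist(b["pos"], a["pos"]) + l_ori(b["ori"], a["ori"])
    return total / len(Y)


def combined(Y, w_fail=1.0, w_path=1.0):
    """Joint objective as a linear combination of individual metrics."""
    return w_fail * g_fail(Y) + w_path * g_path(Y)


Y = [
    {"p_succ": 0.5, "p_eos": 0.0, "pos": (0, 0, 0), "ori": (1, 0, 0, 0)},
    {"p_succ": 1.0, "p_eos": 0.0, "pos": (1, 0, 0), "ori": (1, 0, 0, 0)},
    {"p_succ": 0.0, "p_eos": 1.0, "pos": (1, 1, 0), "ori": (1, 0, 0, 0)},
]
scores = (g_fail(Y), g_cycle(Y), g_path(Y), combined(Y))
```

All three functions are built from smooth or piecewise-smooth operations on the trajectory, which is the only property the inversion step requires of an objective.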
SPI as described above has several properties which make 
it both theoretically attractive and practically applicable in 
real-world scenarios: 

1) Asymptotic optimality: If $G_\Phi$ is the objective 
function of the equivalent minimization problem to 
$\arg\max_x \Phi(P(s_{in}, x))$, $x$ will approximate the optimal parameters 
$x^*$, provided $G_\Phi$ is convex, $P$ is faithfully approximated 
by the shadow program, gradients are bounded and the 
learning rate is small [37]. In practice, near-optimal solutions 
can be reached in a few hundred iterations. 

2) Separation of learning from inference: Because the 
learning problem is reduced to learning a forward model of 
the program, parameter inference is decoupled from training 
and therefore very fast. Individual shadow models can be 
trained offline and combined to arbitrary shadow programs 
at inference time. Parameter inference itself does not require 
additional training, exploration or expensive policy search. 

3) Zero-shot generalization: By extension, our approach 
permits parameter inference with respect to arbitrary objective 
functions without requiring additional training examples. 
The same robot program can be optimized for different 
task objectives $\Phi$ by simply changing the loss function $G_\Phi$ 
accordingly and rerunning NNII. 
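
A minimal sketch of this property: one fixed surrogate is inverted under two different objectives, and only the loss function changes between runs. The closed-form `surrogate`, the weighting `w` and plain gradient descent (used here in place of Adam for brevity) are all assumptions made for illustration.

```python
def surrogate(v):
    """Hypothetical stand-in for a trained shadow program: maps velocity v
    to (predicted contact force in N, predicted duration in s)."""
    return 20.0 * v, 1.0 / v


def g_force(v, f_goal=5.0):
    """Objective 1: hit the goal force."""
    force, _ = surrogate(v)
    return (force - f_goal) ** 2


def g_force_and_time(v, f_goal=5.0, w=0.5):
    """Objective 2: hit the goal force while also penalizing duration."""
    force, duration = surrogate(v)
    return (force - f_goal) ** 2 + w * duration


def infer(objective, v0=0.5, lr=1e-3, steps=500, eps=1e-6):
    """Gradient descent on the input v; the surrogate is never retrained."""
    v = v0
    for _ in range(steps):
        grad = (objective(v + eps) - objective(v - eps)) / (2.0 * eps)
        v -= lr * grad
    return v


v_force = infer(g_force)           # close to 0.25
v_both = infer(g_force_and_time)   # slightly faster motion
```

Swapping the objective shifts the inferred velocity toward a faster motion, even though the surrogate and the inference routine are unchanged.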


V. EXPERIMENTS 


To evaluate our approach in a wide variety of real-world 
applications, we apply SPI to infer program parameters 
from human demonstrations for pick-and-place tasks in a 
household scenario, impact force optimization for contact 
motions and the inference of spiral search heuristics in the 
context of electronics assembly. 


A. Parameter Inference for Complex Task Objectives 


In this experiment, we demonstrate the capacity of SPI 
to infer parameters with respect to complex task objectives 
from scratch. To that end, a household task of picking up a 
glass and depositing it in a sink is considered. Given only a 
program structure (a sequence of unparameterized skills), a 
set of parameters is to be inferred which closely approximates 
a human demonstration of the task. The program 
consists of a linear approach motion, opening the gripper, 
a sequence of 3 linear transfer motions, a skill to close the 
gripper and a linear depart motion (footnote 5). Skill parameters are 
initialized randomly and comprise the goal poses, velocities 
and accelerations of the linear motions as well as the target 
joint state and velocity of the gripper skills. Program parameters 
were inferred to minimize a combination of pointwise 
distance $G_d$ between the TCP and the demonstration as well 
as the demonstrated and predicted hand openings, and a grasp 
penalty $G_g$ enforcing additional precision during grasps. 
Minimizing $G_d$ and $G_g$ is a challenging optimization problem 
because the dynamics of the demonstration and predicted 
trajectories are vastly different, and SPI must implicitly 
adapt the velocities and accelerations of the skills first in 
order to make the predicted trajectories comparable to the 
demonstration. Four human demonstrations were collected in 
virtual reality (VR) using the KnowRob framework [39]. For 
this use case, the gripper state was included in $\mathcal{S}$ and $\mathcal{Y}$. Real-world 
experiments were conducted using a Universal Robots 
UR5 industrial manipulator and a Robotiq 2FG-85 parallel 
gripper. Results are shown in figure 4. For each of the four 
demonstrations, parameter inference results in a robot motion 
which closely approximates the human demonstrations, but 
obeys the constraints such as linearity imposed by the skills. 
The results testify to the capacity of SPI to jointly infer skill 
inputs for realistic robot programs, even if the initial program 
parameters are far from the optimum. 

5 The skill representation for which parameters are inferred in experiments 
and is the ArtiMinds Robot Task Model (ARTM) [38], 
a non-differentiable industrial implementation of generalized manipulation 
strategies [3]. 

Fig. 5: Experiment V-B: Optimization of contact motions 
(bottom left). Each cell shows the mean deviation from $F_{goal}$ 
over 250 optimizations and the improvement over the initial 
parameterization. Right: Convergence behavior of SPI for the 
velocity parameter and resulting force trajectories for $F_{goal}$ 
= 5 N for a linear motion ARTM skill. 


                         Rubber                  PCB                     Foam
F_goal               5 N    10 N   20 N     1 N    5 N    8 N     1 N    1.5 N  2 N
URScript  error (N)  1.43   1.63   2.76     0.16   0.68   0.95    0.24   0.15   0.16
          change     -75%   -84%   -84%     -92%   -71%   -80%    -36%   -51%   -74%
DMP       error (N)  0.56   0.65   2.55     0.14   0.18   0.26    0.16   0.21   0.17
          change     -96%   -94%   -85%     -90%   -93%   -95%    -60%   -54%   -718%


Fig. 6: Experiment V-B: Optimization of contact motions 
for different skill frameworks and surfaces. Each cell shows 
the mean deviation from $F_{goal}$ over 100 optimizations from 
random initial parameters (in N) and the improvement of 
this error over the initial parameterization. 


B. Force-Sensitive Manipulation Without Expert Knowledge 


1) Data-driven optimization of contact forces: To evaluate 
our approach in the context of industrial manipulation, we 
consider the task of touching a surface with a specific 
impact force. In a first series of tests, we use SPI to 
optimize the motion direction, velocity v and acceleration 
a of a linear contact motion skill, which moves the robot 
in a given direction until a force $F_{goal}$ is registered. 
Contact motions are difficult to parameterize manually 
because the true contact force $F_{contact}$ is determined by 
a spring-mass-damper system composed of the robot and 
the contact surface with unknown damping and spring 
characteristics. $F_{goal}$ merely imposes a lower bound on the 
maximum force, with v and a determining the true force on 
contact (cf. figure 5 (gray)). A shadow model was trained 
on 50000 simulated and 500 real executions with randomly 
sampled values of v and a. The task objective consisted 
of a linear combination of $G_{cycle}$ and the mean squared 
error between the predicted contact force and $F_{goal}$. We 
ran SPI for goal forces between 3 and 7 N, collecting 250 
inferred parameterizations for each goal force and randomly 
initializing v and a. A total of 1250 optimized programs 
were executed on a Fanuc LR Mate 200iD/7L manipulator 
and FS-15iA force-torque sensor (FTS). 
For goal forces between 3 and 5 N, the optimized 
parameterizations produce maximum contact forces very 
close to the target force (cf. figure 5, bottom left), 
demonstrating the capability of our approach to zero-shot 
generalize across task objectives (in this case, different 
values of $F_{goal}$). Figure 5 (top right) illustrates the 
convergence behavior of our optimizer for a goal force of 5 
N, which converges on a globally optimal velocity in under 
40 NNII iterations regardless of the initial parameterization. 
Figure 5 (bottom right) shows the force trajectories resulting 
from executing the 250 resulting parameterizations. 


2) Generalization to different skill representations: In a 
second series of experiments, we use SPI over shadow skills 
to parameterize low-level primitives with respect to three different 
surfaces with very different damping characteristics. 
To illustrate the universality of SPI, we optimize the target 
pose, velocity and acceleration parameters of the movel 
URScript primitive [6] as well as the temporal scaling 
parameter $\tau$ and the target pose of a linear discrete DMP 
[40] to establish $F_{goal}$. The experiments were conducted on 
a Universal Robots UR5e. The results summarized in the table 
in figure 6 show that the inferred parameterizations produce contact 
forces well within 0.25 N of the goal on most surfaces, 
which is on the order of sensor noise. For both DMPs and 
URScript skills, SPI could adapt parameters to damping 
characteristics ranging from near-linear (foam) to highly 
nonlinear (rubber) for a wide range of contact forces. 


C. Zero-Shot Generalization Across Task Objectives 


To illustrate the capacity of SPI to zero-shot generalize 
across task objectives, we consider a further use case of 
finding the position of a set of holes on a printed circuit board 
(PCB) for the insertion of electronics components. In practice, 
manufacturing tolerances cause stochastic deviations 
from the expected hole positions on the order of millimeters, 
requiring the use of force-controlled search motions. A program 
structure to solve this task consists of a linear approach 
motion followed by a force-controlled spiral search (cf. figure 
3). The spiral search skill accepts inputs $w_x$ and $w_y$ defining 
the extents of the spiral motion along its principal axes, the 
distance $d$ between spiral arms, force runtime constraints 
$F_{min}$, $F_{max}$, position goal constraints $z_{min}$, $z_{max}$, velocity 
v and acceleration a, and performs a spiral motion in the 
xy-plane of the TCP, maintaining a force between $F_{min}$ and 
$F_{max}$ along the z-axis of the TCP, succeeding if a depth 
between $z_{min}$ and $z_{max}$ can be reached. Shadow models for 
both skills were pre-trained on 50000 simulated executions 
and fine-tuned on 2500 real samples using a Fanuc LR Mate 
200iD/7L and FS-15iA FTS. We collected two baseline test 
datasets of 250 samples each, one with randomly initialized 
input parameters and one parameterized by a human 
expert. Parameter inference was conducted from initial input 
parameters set to the respective baseline parameterization. To 
demonstrate zero-shot generalization across task objectives, 
program parameters were optimized with respect to $G_{cycle}$, 
$G_{fail}$, $G_{path}$ (cf. section IV-C) as well as linear combinations. 
The optimized parameterizations consistently yield improvements 
for their corresponding metrics (cf. figure 7). Compared 
to the already robust human expert parameterization, 
the optimized parameterization nearly eliminates failures 
altogether. Joint parameter inference with respect to a combination 
of task objectives yields gains in both metrics. Figure 
7 shows examples for spiral motions resulting from the optimized 
policies. Optimization with respect to different task 
objectives results in fundamentally different search policies, 
such as a “fail fast” policy for minimizing path length or a 
very robust policy for minimizing failure rate and cycle time 
which near-optimally fits the hole distribution. 
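
To give a feel for how the geometric inputs shape the search path, the sketch below generates waypoints of a circular Archimedean spiral from a single extent and the arm distance. This is a simplified, hypothetical stand-in: the function `spiral_path` and its parameters are introduced here for illustration, and the force constraints, depth constraints and the separate extents per axis of the actual skill are not modeled.

```python
import math


def spiral_path(extent, d, points_per_turn=36):
    """Waypoints (x, y) in metres of an Archimedean spiral in the xy-plane.

    extent: maximum radius of the search region
    d:      radial distance between neighbouring spiral arms
    """
    n = int(round(extent / d * points_per_turn))
    path = []
    for k in range(n + 1):
        theta = 2.0 * math.pi * k / points_per_turn
        r = d * theta / (2.0 * math.pi)  # radius grows by d per full turn
        path.append((r * math.cos(theta), r * math.sin(theta)))
    return path


# 3 mm search radius with 1 mm between arms: three full turns
path = spiral_path(extent=0.003, d=0.001)
```

A smaller arm distance covers the region more densely at the cost of a longer path, which is the kind of trade-off the different task objectives resolve in different ways.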


[Figure 7 panels. Middle: process metrics (path, rate) per task objective ($G_{fail}$, $G_{path}$, $G_{cycle}$, $G_{fail} + G_{path}$) relative to baselines. Bottom: spiral search paths, Pos. (X) [m] vs. Pos. (Y) [m], showing baseline, optimized and hole distribution.]

Fig. 7: Experiment V-C. Top: Stochastic variations of the 
hole position cause a manually parameterized spiral search 
to fail, while an inferred parameterization is more robust. 
Middle: Process metrics for optimized programs relative to 
random (left) and expert (right) baselines (log scale). Bottom: 
Spiral search policies for different objective functions. 


VI. CONCLUSION AND OUTLOOK 


We present an approach for inferring the input param- 
eters of skill-based robot programs by a combination of 
unsupervised learning and gradient-based iterative model 
inversion. To our knowledge, this is the first application of 
differentiable programming and NNII to the skill parameteri- 
zation problem in robotics. We show that SPI can effectively 
infer optimal parameters for robot programs composed from 
non-differentiable skill frameworks. Application to force-sensitive 
contact motions and search heuristics for electronics 
assembly demonstrates its capability to adapt parameters 
to nonlinear system dynamics or stochastic process noise. 
Zero-shot generalization across task objectives and exclusive 
reliance on unsupervised training establish SPI as a powerful 
solution for parameter inference in real-world use cases. 
Like all first-order optimization methods, the performance of 
SPI is conditional on the topology of the objective function. 
We are investigating the possibility of augmenting NNII 
by Hessian-free optimization to further improve performance 
[41]. We further observe that reinforcement learning (RL)-based 
robot learning approaches require a solution to locally 
optimize skill parameters [42], [43], motivating further inquiry 
into synergies between SPI and RL for robot learning. 